ABSTRACT

We  present  asymptotic  and  finite-sample  results  on  the  use  of  stochastic  blockmodels  for  the  analysis  of  network  data. 
We  show  that  the  fraction  of  misclassified  network  nodes  converges  in  probability  to  zero  under  maximum  likelihood 
fitting  when  the  number  of  classes  is  allowed  to  grow  as  the  root  of  the  network  size  and  the  average  network  degree 
grows  at  least  poly-logarithmically  in  this  size.  We  also  establish  finite-sample  confidence  bounds  on 
maximum-likelihood  blockmodel  parameter  estimates  from  data  comprising  independent  Bernoulli  random  variates; 
these  results  hold  uniformly  over  class  assignment.  We  provide  simulations  verifying  the  conditions  sufficient  for  our 
results,  and  conclude  by  fitting  a  logit  parameterization  of  a  stochastic  blockmodel  with  covariates  to  a  network  data 
example  comprising  a  collection  of  Facebook  profiles,  resulting  in  block  estimates  that  reveal  residual  structure. 

Some  key  words :  Likelihood-based  inference;  Social  network  analysis;  Sparse  random  graph;  Stochastic  blockmodel. 


1.  Introduction 

The global structure of social, biological, and information networks is sometimes envisioned
as the aggregate of many local interactions whose effects propagate in ways that are not yet
well understood. There is increasing opportunity to collect data on an appropriate scale for such
systems, but their analysis remains challenging (Goldenberg et al., 2009). Here we analyze a
statistical model for network data known as the (single-membership) stochastic blockmodel.
Its salient feature is that it partitions the $N$ nodes of a network into $K$ distinct classes whose
members all interact similarly with the network. Blockmodels were first associated with the
deterministic concept of structural equivalence in social network analysis (Lorrain & White,
1971), where two nodes were considered interchangeable if their connections were equivalent
in a formal sense. This concept was adapted to stochastic settings and gave rise to the
stochastic blockmodel in work by Holland et al. (1983) and Fienberg et al. (1985). The model
and extensions thereof have since been applied in a variety of disciplines (Wang & Wong,
1987; Nowicki & Snijders, 2001; Girvan & Newman, 2002; Airoldi et al., 2005; Doreian et al.,
2005; Newman, 2006; Handcock et al., 2007; Hoff, 2008; Airoldi et al., 2008; Copic et al., 2009;
Mariadassou et al., 2010; Karrer & Newman, 2011).

In this work we provide a finite-sample confidence bound that can be used when estimating
network structure from data modeled by independent Bernoulli random variates, and also show
that under maximum likelihood fitting of a correctly specified $K$-class blockmodel, the fraction
of misclassified network nodes converges in probability to zero even when the number of classes
$K$ grows with $N$. As noted by Rohe et al. (2011), this is advantageous if we expect class sizes to
remain relatively constant even as $N$ increases. Related results for fixed $K$ have been shown by
Snijders & Nowicki (1997) for networks with linearly increasing degree, and in a stronger sense
for sparse graphs with poly-logarithmically increasing degree by Bickel & Chen (2009).

Our results can be related to those of Rohe et al. (2011), who use spectral methods to bound
the number of misclassified nodes in the stochastic blockmodel with increasing $K$, although
with the more restrictive requirement of nearly linearly increasing degree. As noted by those
authors, this assumption may not hold in many practical settings. Our manner of proof requires
only poly-logarithmically increasing degree, and is more closely related to the fixed-$K$ proof
of Bickel & Chen (2009), although we note that spectral clustering as suggested by Rohe et al.
(2011) provides a computationally appealing alternative to maximum likelihood fitting in
practice.

As discussed by Bickel & Chen (2009), one may assume exchangeability in lieu of a generative
$K$-class blockmodel: an analogue to de Finetti's theorem for exchangeable sequences states
that the probability distribution of an infinite exchangeable random graph is expressible as a
mixture of distributions whose components can be approximated by blockmodels (Kallenberg, 2005;
Bickel & Chen, 2009). An observed network can then be viewed as a sample drawn from this
infinite conceptual population, and so in this case the fitted blockmodel describes one mixture
component thereof.


2.  Statement  of  results 
2-1.  Problem  formulation  and  definitions 

We consider likelihood-based inference for independent Bernoulli data $\{A_{ij}\}$
$(i = 1, \ldots, N;\ j = i+1, \ldots, N)$, both when no structure linking the success
probabilities $\{P_{ij}\}$ is assumed, as well as the special case when a stochastic blockmodel
of known order $K$ is assumed to apply. To this end, let $A \in \{0,1\}^{N \times N}$ denote the
symmetric adjacency matrix of a simple, undirected graph on $N$ nodes whose entries $\{A_{ij}\}$
for $i < j$ are assumed independent Bernoulli$(P_{ij})$ random variates, and whose main diagonal
$\{A_{ii}\}_{i=1}^{N}$ is fixed to zero. The average degree of this graph is $2M/N$, where
$M = \sum_{i<j} P_{ij}$ is its expected number of edges. Under a $K$-class stochastic blockmodel,
these edge probabilities are further restricted to satisfy

$$P_{ij} = \theta_{z_i z_j} \quad (i = 1, \ldots, N;\ j = i+1, \ldots, N) \tag{1}$$

for some symmetric matrix $\theta \in [0,1]^{K \times K}$ and membership vector
$z \in \{1, \ldots, K\}^{N}$. Thus the probability of an edge between two nodes is assumed to
depend only on the class of each node.
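As a minimal sketch of model (1), the following standard-library Python draws an adjacency matrix from a blockmodel and evaluates $M$ and the average degree $2M/N$. The function names, the toy values of $\theta$ and $z$, and the use of class labels 0, ..., K-1 rather than 1, ..., K are illustrative assumptions and do not come from the paper.

```python
import random

def sample_blockmodel(z, theta, seed=0):
    """Draw a symmetric 0/1 adjacency matrix with A[i][j] ~ Bernoulli(theta[z[i]][z[j]]), i < j."""
    rng = random.Random(seed)
    n = len(z)
    A = [[0] * n for _ in range(n)]           # main diagonal stays fixed at zero
    for i in range(n):
        for j in range(i + 1, n):
            p_ij = theta[z[i]][z[j]]          # equation (1): depends only on the two classes
            A[i][j] = A[j][i] = 1 if rng.random() < p_ij else 0
    return A

def expected_edges(z, theta):
    """M = sum over i < j of P_ij, the expected number of edges."""
    n = len(z)
    return sum(theta[z[i]][z[j]] for i in range(n) for j in range(i + 1, n))

# Toy example (hypothetical values): N = 6 nodes, K = 2 classes labelled 0 and 1.
z = [0, 0, 0, 1, 1, 1]
theta = [[0.9, 0.1],
         [0.1, 0.8]]
A = sample_blockmodel(z, theta)
M = expected_edges(z, theta)
average_degree = 2 * M / len(z)
```

In this toy case the three within-class pairs of each class and the nine between-class pairs give $M = 3(0.9) + 3(0.8) + 9(0.1) = 6$, so the average degree is 2.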
Let $L(A; z, \theta)$ denote the log-likelihood of observing data matrix $A$ under a $K$-class
blockmodel with parameters $(z, \theta)$, and $L_P(z, \theta)$ its expectation:

$$L(A; z, \theta) = \sum_{i<j} \{A_{ij} \log \theta_{z_i z_j} + (1 - A_{ij}) \log(1 - \theta_{z_i z_j})\},$$

$$L_P(z, \theta) = \sum_{i<j} \{P_{ij} \log \theta_{z_i z_j} + (1 - P_{ij}) \log(1 - \theta_{z_i z_j})\}.$$

For fixed class assignment $z$, let $N_a$ denote the number of nodes assigned to class $a$, and let
$n_{ab}$ denote the maximum number of possible edges between classes $a$ and $b$; i.e.,
$n_{ab} = N_a N_b$ if $a \neq b$ and $n_{aa} = \binom{N_a}{2}$. Further, let $\hat\theta^{(z)}$ and
$\bar\theta^{(z)}$ be symmetric matrices in $[0,1]^{K \times K}$, with

$$\hat\theta^{(z)}_{ab} = \frac{1}{n_{ab}} \sum_{i<j} A_{ij}\, 1\{z_i = a, z_j = b\} \quad (a = 1, \ldots, K;\ b = a, \ldots, K),$$

$$\bar\theta^{(z)}_{ab} = \frac{1}{n_{ab}} \sum_{i<j} P_{ij}\, 1\{z_i = a, z_j = b\} \quad (a = 1, \ldots, K;\ b = a, \ldots, K),$$

defined whenever $n_{ab} \neq 0$. Observe that $\hat\theta^{(z)}$ comprises sample proportion
estimators as a function of $z$, whereas $\bar\theta^{(z)}$ is its expectation under the independent
$\{\text{Bernoulli}(P_{ij})\}$ model. Taken over all class assignments $z \in \{1, \ldots, K\}^{N}$,
the sets $\{\hat\theta^{(z)}\}$ comprise a sufficient statistic for the family of $K$-class
stochastic blockmodels, and for each $z$, $\hat\theta^{(z)}$ maximizes $L(A; z, \cdot)$.
Analogously, the sets $\{\bar\theta^{(z)}\}$ are functions of the model parameters
$\{P_{ij}\}_{i<j}$, and maximize $L_P(z, \cdot)$. We write $\hat\theta$ and $\bar\theta$ when the
choice of $z$ is understood, and $L(A; z)$ and $L_P(z)$ to abbreviate
$\sup_\theta L(A; z, \theta)$ and $\sup_\theta L_P(z, \theta)$ respectively.
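The quantities just defined can be computed directly. The sketch below, in standard-library Python, tallies $n_{ab}$ and the observed edge counts for a given $z$, forms the sample proportions $\hat\theta^{(z)}_{ab}$, and evaluates the maximized log-likelihood $L(A; z)$. It assumes class labels 0, ..., K-1, counts each unordered node pair once under the class pair sorted so that $a \le b$, and stores only the upper triangle; all names are illustrative.

```python
import math

def block_counts(A, z, K):
    """Return (n, e): possible and observed edge counts for each class pair a <= b."""
    N = len(z)
    n = [[0] * K for _ in range(K)]
    e = [[0] * K for _ in range(K)]
    for i in range(N):
        for j in range(i + 1, N):
            a, b = sorted((z[i], z[j]))       # store each unordered class pair once
            n[a][b] += 1
            e[a][b] += A[i][j]
    return n, e

def theta_hat(n, e):
    """Sample proportions e_ab / n_ab for a <= b; None where n_ab = 0 (undefined)."""
    K = len(n)
    return [[e[a][b] / n[a][b] if n[a][b] else None for b in range(K)] for a in range(K)]

def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)   # convention 0 log 0 = 0

def profile_loglik(n, e):
    """L(A; z) = sup over theta of L(A; z, theta), attained at theta_hat."""
    K = len(n)
    total = 0.0
    for a in range(K):
        for b in range(a, K):
            if n[a][b]:
                t = e[a][b] / n[a][b]
                total += n[a][b] * (xlogx(t) + xlogx(1 - t))
    return total

# Toy data (hypothetical): 4 nodes, edges (0,1), (2,3) and (0,2).
A = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
z = [0, 0, 1, 1]
n, e = block_counts(A, z, K=2)
th = theta_hat(n, e)
ll = profile_loglik(n, e)
```

Here $n_{00} = n_{11} = 1$ and $n_{01} = 4$, so the within-class estimates equal 1 and the between-class estimate is $1/4$; only the between-class block contributes to $L(A; z)$.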

Finally, observe that when a blockmodel with parameters $(\bar z, \bar\theta)$ is in force, then
$P_{ij} = \bar\theta_{\bar z_i \bar z_j}$ in accordance with (1), and consequently $L_P$ is
maximized by the true parameter values $(\bar z, \bar\theta)$:

$$L_P(\bar z, \bar\theta) - L_P(z, \theta) = \sum_{i<j} D(P_{ij} \,\|\, \theta_{z_i z_j}) \ge \sum_{i<j} 2\,(P_{ij} - \theta_{z_i z_j})^2 \ge 0,$$

where $D(p \,\|\, p')$ denotes the Kullback-Leibler divergence of a Bernoulli$(p')$ distribution
from a Bernoulli$(p)$ one.
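The second inequality in the display above is Pinsker's inequality specialized to Bernoulli distributions, $D(p \,\|\, p') \ge 2(p - p')^2$. A small numerical check on a grid, in standard-library Python, is sketched below; the function name and the grid are illustrative choices.

```python
import math

def bernoulli_kl(p, q):
    """D(p || q) between Bernoulli(p) and Bernoulli(q), with the convention 0 log 0 = 0."""
    def term(x, y):
        if x == 0:
            return 0.0
        if y == 0:
            return math.inf               # positive mass where q has none
        return x * math.log(x / y)
    return term(p, q) + term(1 - p, 1 - q)

# Check D(p || q) >= 2 (p - q)^2 on a grid of probabilities (tiny slack for rounding).
grid = [k / 20 for k in range(21)]
pinsker_ok = all(
    bernoulli_kl(p, q) >= 2 * (p - q) ** 2 - 1e-12
    for p in grid for q in grid
)
```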

2-2. Fitting a K-class stochastic blockmodel to independent Bernoulli trials

Fitting a $K$-class stochastic blockmodel to $\binom{N}{2}$ independent Bernoulli$(P_{ij})$ trials
yields estimates $\hat\theta^{(z)}$ of averages $\bar\theta^{(z)}$ of subsets of the parameter set
$\{P_{ij}\}$, with each class assignment $z$ inducing a partition of that set. We begin with a basic
lemma that expresses the difference $L(A; z) - L_P(z)$ in terms of $\hat\theta^{(z)}$ and
$\bar\theta^{(z)}$, and follows directly from their respective maximizing properties.

LEMMA 1. Let $\{A_{ij}\}_{i<j}$ comprise independent Bernoulli$(P_{ij})$ trials. Then the difference
$\sup_\theta L(A; z, \theta) - \sup_\theta L_P(z, \theta)$ can be expressed for
$X = \sum_{i<j} A_{ij} \log\{\bar\theta_{z_i z_j} / (1 - \bar\theta_{z_i z_j})\}$ as

$$L(A; z) - L_P(z) = \sum_{a \le b} n_{ab}\, D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) + X - E(X).$$
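Lemma 1 is an algebraic identity, so it can be checked numerically for any data set. The following sketch, in standard-library Python, draws unstructured $P_{ij}$ uniformly from $[0.05, 0.95]$, samples $A$, picks an arbitrary $z$, and compares the two sides. The range of $P_{ij}$, the sizes, and all names are assumptions made for illustration.

```python
import math
import random

def plogq(p, q):
    return 0.0 if p == 0 else p * math.log(q)   # convention 0 log 0 = 0

def lemma1_sides(A, P, z):
    """Return (L(A; z) - L_P(z), sum_ab n_ab D(hat || bar) + X - E(X)) for assignment z."""
    N = len(z)
    n, sA, sP = {}, {}, {}
    for i in range(N):
        for j in range(i + 1, N):
            ab = tuple(sorted((z[i], z[j])))
            n[ab] = n.get(ab, 0) + 1
            sA[ab] = sA.get(ab, 0) + A[i][j]
            sP[ab] = sP.get(ab, 0.0) + P[i][j]
    lhs = rhs = 0.0
    for ab in n:
        hat, bar = sA[ab] / n[ab], sP[ab] / n[ab]
        L_hat = n[ab] * (plogq(hat, hat) + plogq(1 - hat, 1 - hat))
        L_bar = n[ab] * (plogq(bar, bar) + plogq(1 - bar, 1 - bar))
        lhs += L_hat - L_bar
        kl = plogq(hat, hat / bar) + plogq(1 - hat, (1 - hat) / (1 - bar))
        logit = math.log(bar / (1 - bar))
        rhs += n[ab] * kl + (sA[ab] - sP[ab]) * logit   # this block's share of X - E(X)
    return lhs, rhs

rng = random.Random(1)
N, K = 30, 3
P = [[0.0] * N for _ in range(N)]
A = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        P[i][j] = P[j][i] = rng.uniform(0.05, 0.95)     # no block structure assumed
        A[i][j] = A[j][i] = int(rng.random() < P[i][j])
z = [rng.randrange(K) for _ in range(N)]                # an arbitrary class assignment
lhs, rhs = lemma1_sides(A, P, z)
```

The two returned values should agree up to floating-point rounding, whatever the choice of $z$.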

We first bound the former quantity in this expression, which provides a measure of the distance
between $\hat\theta$ and its estimand $\bar\theta$ under the setting of Lemma 1. The bound is used
in subsequent asymptotic results, and also yields a kind of confidence measure on $\hat\theta$ in
the finite-sample regime.

THEOREM 1. Suppose that a $K$-class stochastic blockmodel is fitted to data $\{A_{ij}\}_{i<j}$
comprising $\binom{N}{2}$ independent Bernoulli$(P_{ij})$ trials, where, for any class assignment
$z$, estimate $\hat\theta$ maximizes the blockmodel log-likelihood $L(A; z, \cdot)$. Then with
probability at least $1 - \delta$,

$$\max_z \Big\{ \sum_{a \le b} n_{ab}\, D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \Big\} < N \log K + (K^2 + K) \log\Big(\frac{N}{K} + 1\Big) + \log\frac{1}{\delta}. \tag{2}$$

Theorem 1 is proved in the Appendix via the method of types: for fixed $z$, the probability of
any realization of $\hat\theta$ is first bounded by
$\exp\{-\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})\}$. A counting argument
then yields a deviation result in terms of $(N/K + 1)^{K^2 + K}$, and finally a union bound is
applied so that the result holds uniformly over all $K^N$ possible choices of assignment vector $z$.

Our second result is asymptotic, and combines Theorem 1 with a Bernstein inequality for
bounded random variables, applied to the latter terms $X - E(X)$ in Lemma 1. To ensure
boundedness we assume minimal restrictions on each $P_{ij}$; this Bernstein inequality, coupled
with a union bound to ensure that the result holds uniformly over all $z$, dictates growth
restrictions on $K$ and $M$.

THEOREM 2. Assume the setting of Theorem 1, whereby a $K$-class blockmodel is fitted to
$\binom{N}{2}$ independent Bernoulli$(P_{ij})$ random variates $\{A_{ij}\}_{i<j}$, and further
assume that $1/N^2 \le P_{ij} \le 1 - 1/N^2$ for all $N$ and $i < j$. Then if $K = O(N^{1/2})$ and
$M = \omega(N (\log N)^{3+\delta})$ for some $\delta > 0$,

$$\max_z |L(A; z) - L_P(z)| = o_P(M).$$

Thus whenever each $P_{ij}$ is bounded away from 0 and 1 in the manner above, the maximized
log-likelihood function $L(A; z) = \sup_\theta L(A; z, \theta)$ is asymptotically well behaved in
network size $N$ as long as the network's average degree $2M/N$ grows faster than
$(\log N)^{3+\delta}$ and the number $K$ of classes fitted to it grows no faster than $N^{1/2}$.

2-3.  Fitting  a  correctly  specified  K-class  stochastic  blockmodel 

The above results apply to the general case of independent Bernoulli data $\{A_{ij}\}$, with no
additional structure assumed amongst the set of success probabilities $\{P_{ij}\}$; if we further
assume the data to be generated by a $K$-class stochastic blockmodel whose parameters
$(\bar z, \bar\theta)$ are subject to suitable identifiability conditions, it is possible to
characterize the behavior of the class assignment estimator $\hat z$ under maximum likelihood
fitting of a correctly specified $K$-class blockmodel.

THEOREM 3. If the conclusion $\max_z |L(A; z) - L_P(z)| = o_P(M)$ of Theorem 2 holds, and
data are generated according to a $K$-class blockmodel with membership vector $\bar z$, then

$$L_P(\bar z) - L_P(\hat z) = o_P(M), \tag{3}$$

with respect to the maximum-likelihood $K$-class blockmodel class assignment estimator $\hat z$.

Let $N_e(\hat z)$ be the number of incorrect class assignments under $\hat z$, counted for every
node whose true class under $\bar z$ is not in the majority within its estimated class under
$\hat z$. If furthermore the following identifiability conditions hold with respect to the model
sequence:

(i) for all blockmodel classes $a = 1, \ldots, K$, class size $N_a$ grows as
$\min_a \{N_a\} = \Omega(N/K)$;

(ii) the following holds over all distinct class pairs $(a, b)$ and all classes $c$:

$$\min_{(a,b)} \max_c \left\{ D\Big(\bar\theta_{ac} \,\Big\|\, \frac{\bar\theta_{ac} + \bar\theta_{bc}}{2}\Big) + D\Big(\bar\theta_{bc} \,\Big\|\, \frac{\bar\theta_{ac} + \bar\theta_{bc}}{2}\Big) \right\} = \Omega\Big(\frac{MK}{N^2}\Big);$$

then it follows from (3) that $N_e(\hat z) = o_P(N)$.

Thus the conclusion of Theorem 3 is that under suitable conditions the fraction $N_e/N$ of
misclassified nodes goes to zero in $N$, yielding a convergence result for stochastic blockmodels
with growing number of classes. Condition (i) stipulates that all class sizes grow at a rate that is
eventually bounded below by a single constant times $N/K$, while condition (ii) ensures that
any two rows of $\bar\theta$ differ in at least one entry by an amount that is eventually bounded
below by a single constant times $MK/N^2$. Observe that if eventually $K = N^{1/2}$ and
$M = N(\log N)^4$ so that conditions on $K$ and $M$ sufficient for Theorem 2 are met, then since
$(\log N)^4 = o(N^{1/2})$, it follows that $MK/N^2$ goes to zero in $N$.
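The arithmetic behind this last observation can be written out explicitly; substituting $K = N^{1/2}$ and $M = N(\log N)^4$ gives

```latex
\[
\frac{MK}{N^2}
  = \frac{N (\log N)^4 \cdot N^{1/2}}{N^2}
  = \frac{(\log N)^4}{N^{1/2}}
  \longrightarrow 0 \qquad (N \to \infty),
\]
```

so the separation between rows of $\bar\theta$ demanded by condition (ii) is itself allowed to shrink as the network grows.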


3.  Numerical  results 

We now present results of a small simulation study undertaken to investigate the assumptions
and conditions of Theorems 1-3 above, in which $K$-class blockmodels were fitted to various
networks generated at random from models corresponding to each of the three theorems.
Because exact maximization in $z$ of the blockmodel log-likelihood $L(A; z, \theta)$ is
computationally intractable even for moderate $N$, we instead employed Gibbs sampling to explore
the function $\max_\theta L(A; z, \theta)$ and recorded the best value of $z$ visited by the
sampler. As the results of Theorems 1 and 2 hold uniformly in $z$, however, we expect
$\bar\theta$ and $L_P(z)$ to be close to their empirical estimates whenever $N$ is sufficiently
large, regardless of the approach employed to select $z$. This fact also suggests that a
single-class (Erdős-Rényi) blockmodel may come closest to achieving equality in Theorems 1 and 2,
as many class assignments are equally likely a priori to have high likelihood. By similar
reasoning, a weakly identifiable model should come closest to achieving the error bound in
Theorem 3, such as one with nearly identical within- and between-class edge probabilities. We
describe each of these cases empirically in the remainder of this section.
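The text does not specify the sampler, so the following is only one plausible reading: a single-site Gibbs-style search in standard-library Python that resamples each $z_i$ with probability proportional to $\exp\{\max_\theta L(A; z, \theta)\}$ and records the best assignment visited. The target distribution, the number of sweeps, and every name below are assumptions, and the profile log-likelihood is recomputed from scratch at each step, which is only practical for small $N$.

```python
import math
import random

def profile_loglik(A, z, K):
    """max over theta of L(A; z, theta), using block-wise sample proportions."""
    N = len(z)
    n = [[0] * K for _ in range(K)]
    e = [[0] * K for _ in range(K)]
    for i in range(N):
        for j in range(i + 1, N):
            a, b = sorted((z[i], z[j]))
            n[a][b] += 1
            e[a][b] += A[i][j]
    total = 0.0
    for a in range(K):
        for b in range(a, K):
            for count in (e[a][b], n[a][b] - e[a][b]):
                if count:                               # 0 log 0 = 0
                    total += count * math.log(count / n[a][b])
    return total

def gibbs_search(A, K, sweeps=20, seed=0):
    """Resample one label at a time; return the best (z, log-likelihood) pair visited."""
    rng = random.Random(seed)
    N = len(A)
    z = [rng.randrange(K) for _ in range(N)]
    best_z, best_ll = list(z), profile_loglik(A, z, K)
    for _ in range(sweeps):
        for i in range(N):
            lls = []
            for c in range(K):
                z[i] = c
                lls.append(profile_loglik(A, z, K))
            top = max(lls)
            weights = [math.exp(ll - top) for ll in lls]   # conditional of z_i up to a constant
            z[i] = rng.choices(range(K), weights=weights)[0]
            if lls[z[i]] > best_ll:
                best_z, best_ll = list(z), lls[z[i]]
    return best_z, best_ll

# Toy network (hypothetical): two disjoint 5-cliques.
N = 10
A = [[1 if i != j and (i < 5) == (j < 5) else 0 for j in range(N)] for i in range(N)]
best_z, best_ll = gibbs_search(A, K=2)
planted_ll = profile_loglik(A, [0] * 5 + [1] * 5, 2)
```

For this toy network the planted two-class assignment attains the largest possible value, zero, because every block is either complete or empty; the search is not guaranteed to reach it in a fixed number of sweeps.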

First, the tightness of the confidence bound of (2) from Theorem 1 was investigated by fitting
$K$-class blockmodels to Erdős-Rényi networks comprising $\binom{N}{2}$ independent
Bernoulli$(p)$ trials, with $N = 500$ nodes and $p = 0.075$ chosen to match the data analysis
example in the sequel, and $K \in \{5, 10, 20, 30, 40, 50\}$. For each $K$, the error terms
$\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})$ and
$\{\sum_{a \le b} n_{ab} (\hat\theta_{ab} - \bar\theta_{ab})^2\}^{1/2}$ were recorded for each of
100 trials and compared to the respective 95% confidence bounds ($\delta = 0.05$) derived from
Theorem 1. The bounds overestimated the respective errors by a factor of 3 to 7 on average, with
small standard deviation. In this worst-case scenario the bound is loose, but not unusable; the
errors never exceeded the 95% confidence bounds in any of the trials.

To test whether the assumptions of Theorem 2 are necessary as well as sufficient to obtain
convergence of $L(A; \hat z)/M$ to $L_P(\hat z)/M$, blockmodels were next fitted to Erdős-Rényi
networks of increasing size, for $N$ in the range 50-1050. The corresponding normalized
log-likelihood error $|L(A; \hat z) - L_P(\hat z)|/M$ for different rates of growth in the expected
number of edges $M$ and the number of fitted classes $K$ is shown in Fig. 1. Observe from the
leftmost panel that when $M = N(\log N)^4$ and $K = N^{1/2}$, as prescribed by the theorem, this
error decreases in $N$. If the edge density is reduced to $M/N = (\log N)^2$, we observe in the
center panel convergence when $K = N^{1/2}$ and divergence when $K = N^{3/5}$. This suggests that
the error as a function of $K$ follows Theorem 2 closely, but that the network can be somewhat
more sparse than it requires.

Fig. 1. Simulation study results illustrating Theorems 1-3. Left: Likelihood error
$|L(A; \hat z) - L_P(\hat z)|/M$ as a function of network size $N$, shown for $M = N(\log N)^4$
with $K = N^{1/2}$. Center: Same quantity for $M = N(\log N)^2$ with $K = N^{3/5}$ (dotted) and
$K = N^{1/2}$ (solid). Right: Error rate $N_e(\hat z)/N$ for $M = N(\log N)^2$ with $K = N^{1/2}$
and $\gamma = 4/5$ (dotted), $\gamma = 9/10$ (dashed), $\gamma = 1$ (solid)

To test the conditions of Theorem 3, blockmodels with parameters $(\bar z, \bar\theta)$ and an
increasing number of classes $K$ were used to generate data, and corresponding node
misclassification error rates $N_e(\hat z)/N$ were recorded as a function of correctly specified
$K$-class blockmodel fitting. Model parameter $\bar z$ was chosen to yield equally-sized blocks, so
as to meet identifiability condition (i) of Theorem 3. Parameter
$\bar\theta = \alpha I + \beta 1 1^{T}$ was chosen to yield within-class and between-class success
probabilities with the property that for any class pair $(a, b)$, the condition
$D(\bar\theta_{aa} \,\|\, (\bar\theta_{aa} + \bar\theta_{ab})/2) = M K^{\gamma} / (20 N^2)$ was
satisfied, with $\gamma \in \{4/5, 9/10, 1\}$; identifiability condition (ii) was thus met only in
the $\gamma = 1$ case. The rightmost panel of Fig. 1 shows the fraction $N_e(\hat z)/N$ of
misclassified nodes when $M = N(\log N)^2$ and $K = N^{1/2}$, corresponding to the setting in which
convergence of $L(A; \hat z)/M$ to $L_P(\hat z)/M$ was observed above; this fraction is seen to
decay when $\gamma = 1$ or $9/10$, but to increase when $\gamma = 4/5$. This behavior conforms with
Theorem 3 and suggests that its identifiability conditions are close to being necessary as well as
sufficient.
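The error count $N_e(\hat z)$ reported in this experiment follows the majority rule stated before Theorem 3. A minimal sketch in standard-library Python is given below; the function name and the toy label vectors are illustrative assumptions.

```python
from collections import Counter

def misclassified(z_true, z_hat):
    """N_e(z_hat): nodes whose true class is not the majority true class in their estimated class."""
    errors = 0
    for c in set(z_hat):
        true_labels = [t for t, h in zip(z_true, z_hat) if h == c]
        majority_count = Counter(true_labels).most_common(1)[0][1]
        errors += len(true_labels) - majority_count
    return errors

# Toy example: estimated labels are a permutation of the truth, except for one node.
z_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
z_hat = [1, 1, 1, 0, 0, 2, 2, 2, 2]
n_e = misclassified(z_true, z_hat)
```

Because the count depends only on the composition of each estimated class, it is unaffected by relabelling of the classes; in the toy example only the sixth node is counted as an error.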


4.  Network  data  example 
4-1. Facebook social network dataset

To illustrate the use of our results in the fitting of $K$-class stochastic blockmodels to
network data, we employed a publicly available social network dataset containing $N = 553$
undergraduate Facebook profiles from the California Institute of Technology
(people.maths.ox.ac.uk/~porterm/data/facebook5.zip). These profiles indicate whenever a pair
of students have identified one another as friends, yielding a network of 11 511 edges and
accompanying covariate information including gender, class year, and hall of residence.

Traud et al. (2011) applied community detection algorithms to this network, and compared
their output to partitions based on categorical covariates such as those identified above. They
concluded that a grouping of students by residence hall was most similar to the best algorithmic
grouping obtained, and thus that shared residence hall membership was the best predictor for
the formation of community structure. This structure is reflected in the leftmost panel of Fig. 2,
which shows the network adjacency structure under an ordering of students by residence hall.

4-2.  Logit  blockmodel  parameterization  and  fitting  procedure 

Here we build on the results of Traud et al. (2011) by taking covariate information explicitly
into account when fitting the Facebook dataset described above. Specifically, by assuming only
that links are independent Bernoulli variates and then employing confidence bounds to assess
fitted blocks by way of parameter $\bar\theta^{(z)}$, we examine these data for residual community
structure beyond that well explained by the covariates themselves.

Fig. 2. Facebook social network dataset and its fitting statistics for varying number of
blockmodel classes $K$. Left: Adjacency data matrix of a network of Facebook undergraduate student
profiles. Center: Model order statistic for fitted logit blockmodels as a function of $K$. Right:
Out-of-sample prediction error as a function of $K$

Since the results of Theorems 1 and 2 hold uniformly over all choices of blockmodel membership
vector $z$, we may select $z$ in any manner, including those that depend on covariates.
For this example, we determined an approximate maximum likelihood estimate $\hat z$ under a logit
blockmodel that allows the direct incorporation of covariates. The model is parameterized such
that the log-odds ratio of an edge occurrence between nodes $i$ and $j$ is given by

$$\log \frac{P_{ij}}{1 - P_{ij}} = \theta_{z_i z_j} + x(i,j)^{T} \beta \quad (i = 1, \ldots, N;\ j = i+1, \ldots, N), \tag{4}$$

where $x(i,j)$ is a vector of covariates indicating shared group membership, and model parameters
$(\theta, \beta, z)$ are estimated from the data. Four categorical covariates were used: the three
indicated above, plus an eight-category covariate indicating the range of the observed degree of
each node; see Karrer & Newman (2011) for related discussion on this point. Matrix $\theta$ is
analogous to the blockmodel parameter $\theta$ of (1), here on the log-odds scale, vector $z$
specifies the blockmodel class assignment, and vector $\beta$ was implemented here with
sum-to-zero identifiability constraints.
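To make the parameterization in (4) concrete, the sketch below inverts the log-odds to obtain an edge probability, using standard-library Python. The covariate coding (one 0/1 indicator per categorical covariate, equal to 1 when the two nodes share a category), the attribute tuples, and all numerical values are hypothetical; the paper's exact coding, including its sum-to-zero constraints, is not reproduced here.

```python
import math

def shared_membership(attrs_i, attrs_j):
    """x(i, j): one 0/1 indicator per covariate, 1 when nodes i and j share that category."""
    return [1 if ai == aj else 0 for ai, aj in zip(attrs_i, attrs_j)]

def edge_probability(theta_zz, x, beta):
    """Invert (4): P_ij = 1 / (1 + exp(-(theta_{z_i z_j} + x(i, j)^T beta)))."""
    eta = theta_zz + sum(xk * bk for xk, bk in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical attributes: (gender, class year, residence hall).
node_i = ("F", 2008, 3)
node_j = ("M", 2008, 3)
x = shared_membership(node_i, node_j)          # shares class year and hall, not gender
beta = [0.2, 0.5, 1.0]                         # hypothetical covariate effects
p = edge_probability(theta_zz=-3.0, x=x, beta=beta)
```

With these illustrative values the linear predictor is $-3.0 + 0.5 + 1.0 = -1.5$, so shared class year and residence hall raise the edge probability relative to the block baseline of $\theta_{z_i z_j} = -3.0$.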

Because exact maximization of the log-likelihood function $L(A; \theta, \beta, z)$ corresponding
to (4) is computationally intractable, we instead employed an approach that alternated between
Markov chain Monte Carlo exploration of $z$ while holding $(\theta, \beta)$ constant, and
optimization of $\theta$ and $\beta$ while holding $z$ constant. We tested different
initialization methods and observed that highest likelihoods were consistently produced by first
fitting class assignment vector $z$. This fitting procedure provides a means of estimating
averages $\bar\theta^{(z)}$ over subsets of the set $\{P_{ij}\}_{i<j}$ under the assumption that
the network data comprise independent Bernoulli$(P_{ij})$ trials.

4-3.  Data  analysis 

We fitted the logit blockmodel of (4) for values of $K$ ranging from 1 to 50 using the
stochastic maximization procedure described in the preceding paragraph, and gauged model order by
the Bayesian information criterion and out-of-sample prediction using five-fold cross validation,
shown respectively in the center and rightmost panels of Fig. 2. These plots suggest a relatively
low model order, beginning around $K = 4$. The corresponding 95% confidence bounds on the
divergence of $\hat\theta^{(z)}$ from $\bar\theta^{(z)}$ provided by Theorem 1 also yield small
values for $K$ in the range 4-7: for example, when $K = 5$, the normalized sum of Kullback-Leibler
divergences $\binom{N}{2}^{-1} \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})$ is
bounded by 0.0067. Corresponding normalized root-mean-square error bounds over this range of $K$
are approximately one order of magnitude larger.
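The normalized bound quoted here can be approximated directly from the right-hand side of (2). The sketch below, in standard-library Python, evaluates it for $N = 553$ and $\delta = 0.05$; the function names are illustrative, and the assumption is that the normalization is simply division by $\binom{N}{2}$ as in the text.

```python
import math

def theorem1_bound(N, K, delta):
    """Right-hand side of (2): N log K + (K^2 + K) log(N/K + 1) + log(1/delta)."""
    return N * math.log(K) + (K * K + K) * math.log(N / K + 1) + math.log(1 / delta)

def normalized_bound(N, K, delta=0.05):
    """Bound (2) divided by the number of node pairs, N choose 2."""
    return theorem1_bound(N, K, delta) / math.comb(N, 2)

# Facebook example from the text: N = 553 profiles, 95% confidence, K in the range 4-7.
bounds = {K: normalized_bound(553, K) for K in (4, 5, 6, 7)}
```

Evaluated this way, the four values fall within about $10^{-4}$ of the bounds reported for $K = 4, \ldots, 7$; the small residual differences presumably reflect rounding or minor details of the authors' calculation that the text does not specify.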

We then examined approximate maximum likelihood estimates of $z$ for $K$ in the range 4-7,
as shown in the top two rows of Fig. 3; larger values of $K$ also reveal block structure, but
exhibit correspondingly larger confidence bound evaluations. The permuted adjacency structures
under each estimated class assignment $\hat z$ are shown in the top row, along with the
corresponding values of $\hat\theta$ below in the second row. The structure of $\hat\theta$ over
this range of $K$ suggests that after covariates are taken into account, it is possible to identify
a subset of students who divide naturally into two residual "meta-groups" that interact less
frequently with one another in comparison to the remaining subjects in the dataset; the precision
of the corresponding estimates $\hat\theta$ can be quantified by Theorem 1, as in the caption of
Fig. 3.

Fig. 3. Results of logit blockmodel fitting to the data of Fig. 2 for each of
$K \in \{4, 5, 6, 7\}$ classes. Top row: Adjacency structure of the data, permuted to show block
assignments for $K \in \{4, 5, 6, 7\}$. Second row: Corresponding estimates $\hat\theta$, with
Kullback-Leibler divergence bounds 0.0057, 0.0067, 0.0077, and 0.0086. Bottom row: Residence
hall assignments of students whose grouping remained constant over these four values of $K$

As $K$ increases, these groups become more tightly concentrated, as extra blocks absorb
students whose connections are more evenly distributed. While the exact membership of each group
varied over $K$, in part due to stochasticity in the fitting algorithm employed, we observed 199
students whose meta-group membership remained constant. The bottom row of Fig. 3 shows the
8 residence halls identified for these sets of students, with the ninth category indicating
unreported; observe that the effect of residence hall is still visible in that the left-hand
grouping has more students in halls 4-7, while the right-hand grouping has more students in
halls 1, 2, and 8.
Appendix

Proofs  of  Theorems  1  and  2 

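Proof of Theorem 1. To begin, observe that for any fixed class assignment $z$, every
$\hat\theta_{ab}$ is a normalized sum of $n_{ab}$ independent Bernoulli random variables, with
corresponding mean $\bar\theta_{ab}$. A Chernoff bound (Dubhashi & Panconesi, 2009) shows

$$\mathrm{pr}(\hat\theta_{ab} \ge \bar\theta_{ab} + t) \le e^{-n_{ab} D(\bar\theta_{ab} + t \,\|\, \bar\theta_{ab})}, \quad 0 < t \le 1 - \bar\theta_{ab};$$

$$\mathrm{pr}(\hat\theta_{ab} \le \bar\theta_{ab} - t) \le e^{-n_{ab} D(\bar\theta_{ab} - t \,\|\, \bar\theta_{ab})}, \quad 0 < t \le \bar\theta_{ab}.$$

Since these bounds also hold respectively for $\mathrm{pr}(\hat\theta_{ab} = \bar\theta_{ab} \pm t)$,
we may bound the probability of any given realization
$\vartheta \in \{0, 1/n_{ab}, \ldots, 1\}$ of $\hat\theta_{ab}$ in terms of the Kullback-Leibler
divergence of $\bar\theta_{ab}$ from $\vartheta$:

$$\mathrm{pr}(\hat\theta_{ab} = \vartheta) \le e^{-n_{ab} D(\vartheta \,\|\, \bar\theta_{ab})}.$$

By independence of the $\{A_{ij}\}_{i<j}$, this implies a corresponding bound on the probability
of any $\hat\theta$:

$$\mathrm{pr}(\hat\theta) \le \exp\Big\{ -\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \Big\}. \tag{A1}$$

Now, let $\hat\Theta$ denote the range of $\hat\theta$ for fixed $z$, and observe that since each
of the $\binom{K+1}{2}$ lower-diagonal entries $\{\hat\theta_{ab}\}_{a \le b}$ of $\hat\theta$ can
independently take on $n_{ab} + 1$ distinct values, we have that
$|\hat\Theta| = \prod_{a \le b} (n_{ab} + 1)$. Subject to the constraint that
$\sum_{a \le b} n_{ab} = \binom{N}{2}$, we see that this quantity is maximized when
$n_{ab} = \binom{N}{2} / \binom{K+1}{2}$ for all $a \le b$, and hence

$$|\hat\Theta| \le \Big\{ \tbinom{N}{2} \big/ \tbinom{K+1}{2} + 1 \Big\}^{\binom{K+1}{2}} \le (N^2/K^2 + 1)^{(K^2+K)/2} \le (N/K + 1)^{K^2+K}. \tag{A2}$$

Now consider the event that $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab})$ is at
least as large as some $\epsilon > 0$; the probability of this event is given by
$\mathrm{pr}(\hat\Theta_\epsilon)$ for

$$\hat\Theta_\epsilon = \Big\{ \hat\theta \in \hat\Theta : \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon \Big\}. \tag{A3}$$

Since $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon$ for all
$\hat\theta \in \hat\Theta_\epsilon$, we have from (A1) and (A3) that

$$\mathrm{pr}(\hat\Theta_\epsilon) = \sum_{\hat\theta \in \hat\Theta_\epsilon} \mathrm{pr}(\hat\theta) \le \sum_{\hat\theta \in \hat\Theta_\epsilon} e^{-\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \| \bar\theta_{ab})} \le \sum_{\hat\theta \in \hat\Theta_\epsilon} e^{-\epsilon} = |\hat\Theta_\epsilon|\, e^{-\epsilon},$$

and since $|\hat\Theta_\epsilon| \le |\hat\Theta|$, we may use (A2) to obtain, for fixed class
assignment $z$,

$$\mathrm{pr}\Big\{ \sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon \Big\} \le (N/K + 1)^{K^2+K}\, e^{-\epsilon}. \tag{A4}$$

Appealing to a union bound over all $K^N$ possible class assignments and setting
$\epsilon = \log[K^N (N/K + 1)^{K^2+K} / \delta]$ then yields the claimed result. □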
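The counting step (A2) can be spot-checked numerically. The sketch below, in standard-library Python, compares $\log|\hat\Theta| = \sum_{a \le b} \log(n_{ab} + 1)$ with $(K^2 + K)\log(N/K + 1)$ for randomly drawn assignments; the sizes $N = 200$, $K = 8$ and the function names are arbitrary illustrative choices.

```python
import math
import random

def log_num_realizations(z, K):
    """log |Theta_hat| = sum over a <= b of log(n_ab + 1) for a fixed assignment z."""
    sizes = [z.count(a) for a in range(K)]
    total = 0.0
    for a in range(K):
        for b in range(a, K):
            n_ab = math.comb(sizes[a], 2) if a == b else sizes[a] * sizes[b]
            total += math.log(n_ab + 1)
    return total

def log_a2_bound(N, K):
    """log of the final bound in (A2): (K^2 + K) log(N/K + 1)."""
    return (K * K + K) * math.log(N / K + 1)

rng = random.Random(0)
N, K = 200, 8
holds = all(
    log_num_realizations([rng.randrange(K) for _ in range(N)], K) <= log_a2_bound(N, K)
    for _ in range(100)
)
```

Working on the log scale avoids overflow, since $|\hat\Theta|$ itself grows very quickly with $K$.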

Proof of Theorem 2. By Lemma 1, the difference $L(A; z) - L_P(z)$ can be expressed for any fixed
class assignment $z$ as
$\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) + X - E(X)$, where the first term
satisfies the deviation bound of (A4), and
$X = \sum_{i<j} A_{ij} \log\{\bar\theta_{z_i z_j} / (1 - \bar\theta_{z_i z_j})\}$ comprises a
weighted sum of independent Bernoulli$(P_{ij})$ random variables.

To bound the quantity $|X - E(X)|$, observe that since by assumption
$N^{-2} \le P_{ij} \le 1 - N^{-2}$, the same is true for each corresponding average
$\bar\theta_{z_i z_j}$. As a result, the random variables
$X_{ij} = A_{ij} \log\{\bar\theta_{z_i z_j} / (1 - \bar\theta_{z_i z_j})\}$ comprising $X$ are each
bounded in magnitude by $C = 2 \log N$. This allows us to apply a Bernstein inequality for sums of
bounded independent random variables due to Chung & Lu (2006, Theorems 2.8 and 2.9, p. 27), which
states that for any $\epsilon > 0$,

$$\mathrm{pr}\{|X - E(X)| \ge \epsilon\} \le 2 \exp\left\{ -\frac{\epsilon^2}{2 \sum_{i<j} E(X_{ij}^2) + (2/3)\,\epsilon C} \right\}. \tag{A5}$$


D.  S.  Choi,  P.  J.  Wolfe  and  E.  M.  Airoldi 


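Finally, observe that since the event $|L(A; z) - \bar{L}_P(z)| \ge 2\epsilon M$ implies either the event $\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon M$ or the event $|X - E(X)| \ge \epsilon M$, we have for fixed assignment $z$ that

$$\operatorname{pr}\{|L(A; z) - \bar{L}_P(z)| \ge 2\epsilon M\} \le \operatorname{pr}\Big[\Big\{\sum_{a \le b} n_{ab} D(\hat\theta_{ab} \,\|\, \bar\theta_{ab}) \ge \epsilon M\Big\} \cup \big\{|X - E(X)| \ge \epsilon M\big\}\Big].$$

Summing the right-hand sides of (A4) and (A5), each with $\epsilon$ replaced by $\epsilon M$, and then over all $K^N$ possible assignments, yields

$$\operatorname{pr}\Big\{\max_z |L(A; z) - \bar{L}_P(z)| \ge 2\epsilon M\Big\} \le \exp\big\{N \log K + (K^2 + K) \log(N/K + 1) - \epsilon M\big\} + 2 \exp\left\{N \log K - \frac{\epsilon^2 M}{8 \log^2 N + (4/3)\,\epsilon \log N}\right\},$$

where we have used the fact that $\sum_{i<j} E(X_{ij}^2) \le 4 M \log^2 N$ in (A5). It follows directly that if $K = O(N^{1/2})$ and $M = \omega\{N (\log N)^{3+\delta}\}$, then $\lim_{N \to \infty} \operatorname{pr}\{\max_z |L(A; z) - \bar{L}_P(z)| / M \ge \epsilon\} = 0$ for every fixed $\epsilon > 0$ as claimed. □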
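
To see how the growth condition on $M$ enters, the two-term bound can be evaluated numerically. The following Python sketch is an illustration rather than part of the proof: it assumes the union-bound factor $K^N = \exp(N \log K)$, and the function name `theorem2_tail_bound` and all example values are hypothetical.

```python
import math

def theorem2_tail_bound(N: int, K: int, M: float, eps: float) -> float:
    """Evaluate the bound on pr{max_z |L(A; z) - L_P(z)| >= 2 eps M}.

    Both terms are assembled in the log domain; the result is capped at 1,
    since a probability bound above 1 carries no information.
    """
    log_n = math.log(N)
    union = N * math.log(K)  # log of the K^N possible class assignments
    # Term from (A4), with epsilon replaced by eps * M.
    log_t1 = union + (K * K + K) * math.log(N / K + 1.0) - eps * M
    # Term from (A5), with C = 2 log N and sum E(X_ij^2) <= 4 M log^2 N.
    log_t2 = math.log(2.0) + union - eps**2 * M / (
        8.0 * log_n**2 + (4.0 / 3.0) * eps * log_n
    )
    if max(log_t1, log_t2) >= 0.0:
        return 1.0  # the bound is vacuous
    return min(1.0, math.exp(log_t1) + math.exp(log_t2))

# Assumed example: N = 10^4 nodes, K = 10 classes, eps = 0.5.
N, K, eps = 10_000, 10, 0.5
sparse = theorem2_tail_bound(N, K, N * math.log(N) ** 3, eps)
dense = theorem2_tail_bound(N, K, N * math.log(N) ** 4, eps)
```

In this assumed setting the bound is vacuous when $M = N (\log N)^3$ but negligible when $M = N (\log N)^4$, in line with the requirement that $M$ grow faster than $N (\log N)^3$.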

Proof  of  Theorem  3 

Proof of Theorem 3. To begin, note that Theorem 2 holds uniformly in $z$, and thus implies that

$$|\bar{L}_P(\bar{z}) - L(A; \bar{z})| + |\bar{L}_P(\hat{z}) - L(A; \hat{z})| = o_P(M).$$

Since $\hat{z}$ is the maximum-likelihood estimate of class assignment $\bar{z}$, we know that $L(A; \hat{z}) \ge L(A; \bar{z})$, implying that $L(A; \hat{z}) = L(A; \bar{z}) + \delta$ for some $\delta \ge 0$. Thus, by the triangle inequality,

$$|\bar{L}_P(\bar{z}) - \bar{L}_P(\hat{z}) + \delta| \le |\bar{L}_P(\bar{z}) - L(A; \bar{z})| + |\bar{L}_P(\hat{z}) - \{L(A; \bar{z}) + \delta\}| = o_P(M),$$

and since $\bar{L}_P(\bar{z}) \ge \bar{L}_P(\hat{z})$ under any blockmodel with parameter $\bar{z}$, we have $\bar{L}_P(\bar{z}) - \bar{L}_P(\hat{z}) = o_P(M)$. Under conditions (i) and (ii) of Theorem 3, we will now show that also

$$\bar{L}_P(\bar{z}) - \bar{L}_P(\hat{z}) = \frac{N_e(\hat{z})}{N}\, \Omega(M) \qquad \text{(A6)}$$

holds for every realization of $\hat{z}$, thus implying that $N_e(\hat{z}) = o_P(N)$ and proving the theorem.

To show (A6), first observe that any blockmodel class assignment vector $z$ induces a corresponding partition of the set $\{P_{ij}\}_{i<j}$ according to $(i, j) \mapsto (z_i, z_j)$. Formally, $z$ partitions $\{P_{ij}\}_{i<j}$ into $L$ subsets $(S_1, \ldots, S_L)$ via the mapping

$$\zeta_{ij} : (i = 1, \ldots, N;\ j = i + 1, \ldots, N) \to (l = 1, \ldots, L).$$

This partition is separable in the sense that there exists a bijection between $\{1, \ldots, L\}$ and the upper triangular portion of blockmodel parameter $\bar\theta$, such that we write $\bar\theta_{\zeta_{ij}} = \bar\theta_{z_i z_j}$ for membership vector $z$. More generally, for any partition $\Pi$ of $\{P_{ij}\}_{i<j}$, we may define $\bar\theta_l = |S_l|^{-1} \sum_{i<j} P_{ij}\, 1\{P_{ij} \in S_l\}$ as the arithmetic average over all $P_{ij}$ in the subset $S_l$ indexed by $\zeta_{ij} = l$. Thus we may also define

$$\bar{L}^*_P(\Pi) = \sum_{i<j} \big\{P_{ij} \log \bar\theta_{\zeta_{ij}} + (1 - P_{ij}) \log(1 - \bar\theta_{\zeta_{ij}})\big\},$$

so that $\bar{L}^*_P$ and $\bar{L}_P$ coincide on partitions corresponding to admissible blockmodel assignments $z$.

The establishment of (A6) proceeds in three steps: first, we construct and analyze a refinement of the partition $\Pi^z$ induced by any blockmodel assignment vector $z$ in terms of its error $N_e(z)$; then, we show that refinements increase $\bar{L}^*_P(\cdot)$; finally, we apply these results to the maximum-likelihood estimate $\hat{z}$.

Lemma 2. Consider a $K$-class stochastic blockmodel with membership vector $\bar{z}$, and let $\Pi^z$ denote the partition of its associated $\{P_{ij}\}_{1 \le i < j \le N}$ induced by any $z \in \{1, \ldots, K\}^N$. For every $\Pi^z$, there exists a partition $\Pi^*$ that refines $\Pi^z$ and with the property that, if conditions (i) and (ii) of Theorem 3 hold,

$$\bar{L}_P(\bar{z}) - \bar{L}^*_P(\Pi^*) = \frac{N_e(z)}{N}\, \Omega(M), \qquad \text{(A7)}$$

where $N_e(z)$ counts the number of nodes whose true class assignments under $\bar{z}$ are not in the majority within their respective class assignments under $z$.

Lemma 3. Let $\Pi'$ be a refinement of any partition $\Pi$ of the set $\{P_{ij}\}_{i<j}$; then $\bar{L}^*_P(\Pi') \ge \bar{L}^*_P(\Pi)$.

Since Lemma 2 applies to any admissible blockmodel assignment vector $z$, it also applies to the maximum-likelihood estimate $\hat{z}$ for any realization of the data; each $\hat{z}$ in turn induces a partition $\Pi^{\hat{z}}$ of blockmodel edge probabilities $\{P_{ij}\}_{i<j}$, and (A7) holds with respect to its refinement $\Pi^*$. By Lemma 3, $\bar{L}^*_P(\Pi^{\hat{z}}) \le \bar{L}^*_P(\Pi^*)$. Finally, observe that $\bar{L}_P(\hat{z}) = \bar{L}^*_P(\Pi^{\hat{z}})$ by the definition of $\bar{L}^*_P$, and so $\bar{L}_P(\bar{z}) - \bar{L}_P(\hat{z}) \ge \bar{L}_P(\bar{z}) - \bar{L}^*_P(\Pi^*)$, thereby establishing (A6). □


Proof of Lemma 2. The construction of $\Pi^*$ will take several steps. For a given membership class under $z$, partition the corresponding set of nodes into subclasses according to the true class assignment $\bar{z}$ of each node. Then remove one node from each of the two largest subclasses so obtained, and group them together as a pair; continue this pairing process until no more than one nonempty subclass remains, then terminate. Observe that if we denote pairs by their node indices as $(i, j)$, then by construction $z_i = z_j$ but $\bar{z}_i \ne \bar{z}_j$.
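
The pairing procedure just described can be sketched in Python as follows. This is an illustrative reading of the construction, not code from the paper: the function name `pair_misclassified`, the use of a heap to locate the two largest subclasses, and the example label vectors are all assumptions.

```python
from collections import defaultdict
import heapq

def pair_misclassified(z_fit, z_true):
    """Pair nodes that share a fitted class but differ in true class."""
    # Group node indices by fitted class, then by true class (subclasses).
    groups = defaultdict(lambda: defaultdict(list))
    for node, (c_fit, c_true) in enumerate(zip(z_fit, z_true)):
        groups[c_fit][c_true].append(node)

    pairs = []
    for subclasses in groups.values():
        # Max-heap on subclass size (sizes are negated for heapq's min-heap);
        # the true label breaks ties, so node lists are never compared.
        heap = [(-len(nodes), label, nodes) for label, nodes in subclasses.items()]
        heapq.heapify(heap)
        # Stop once at most one nonempty subclass remains.
        while len(heap) >= 2:
            size_a, label_a, nodes_a = heapq.heappop(heap)
            size_b, label_b, nodes_b = heapq.heappop(heap)
            pairs.append((nodes_a.pop(), nodes_b.pop()))
            for size, label, nodes in ((size_a, label_a, nodes_a),
                                       (size_b, label_b, nodes_b)):
                if nodes:  # re-insert subclasses that still contain nodes
                    heapq.heappush(heap, (size + 1, label, nodes))
    return pairs

# Assumed toy example: one fitted class containing three true classes.
z_fit = [0, 0, 0, 0, 0, 0]
z_true = [0, 0, 1, 1, 2, 2]
pairs = pair_misclassified(z_fit, z_true)
```

Every returned pair $(i, j)$ satisfies $z_i = z_j$ and $\bar{z}_i \ne \bar{z}_j$, as required by the construction.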

Repeat the above procedure for each class under $z$, and let $C_1$ denote the total number of pairs thus formed. For each of the $C_1$ pairs $(i, j)$, find all other distinct indices $k$ for which the following holds:

$$D\Big(P_{ik} \,\Big\|\, \frac{P_{ik} + P_{jk}}{2}\Big) + D\Big(P_{jk} \,\Big\|\, \frac{P_{ik} + P_{jk}}{2}\Big) \ge C\, \frac{MK}{N^2}, \qquad \text{(A8)}$$

where $C$ is the constant from condition (ii) of Theorem 3, and indices $ik$ and $jk$ in (A8) are to be interpreted respectively as $ki$ whenever $k < i$, and $kj$ whenever $k < j$. Let $C_2$ denote the total number of distinct triples that can be formed in this manner.

We are now ready to construct the partition $\Pi^*$ of the probabilities $\{P_{ij}\}_{1 \le i < j \le N}$ as follows: For each of the $C_2$ triples $(i, j, k)$, remove $P_{ik}$ (or $P_{ki}$ if $k < i$) and $P_{jk}$ (or $P_{kj}$) from their previous subset assignment under $\Pi^z$, and place them both in a new, distinct two-element subset. We observe the following:

(i) The partition $\Pi^*$ is a refinement of the partition $\Pi^z$ induced by $z$: Since nodes $i$ and $j$ have the same class label under $z$ in that $z_i = z_j$, it follows that for any $k$, $P_{ik}$ and $P_{jk}$ are in the same subset under $\Pi^z$.

(ii) Since for each class at most one nonempty subclass remains after the pairing process, the number of pairs is at least half the number of misclassifications in that class. Therefore we conclude $C_1 \ge N_e(z)/2$.

(iii) Condition (ii) of Theorem 3 implies that for every pair of classes $(a, b)$, there exists at least one class $c$ for which (A8) holds eventually. Thus eventually, for any of the $C_1$ pairs $(i, j)$, we obtain a number of triples at least as large as the cardinality of class $c$. Condition (i) in turn implies that the cardinality of the smallest class grows as $\Omega(N/K)$, and thus we may write $C_2 = C_1\, \Omega(N/K)$.

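We can now express the difference $\bar{L}_P(\bar{z}) - \bar{L}^*_P(\Pi^*)$ as a sum of nonnegative divergences $D(P_{ij} \,\|\, \bar\theta_{\zeta^*_{ij}})$, where $\zeta^*$ is the assignment mapping associated to $\Pi^*$, and use (A8) to lower-bound this difference:

$$\bar{L}_P(\bar{z}) - \bar{L}^*_P(\Pi^*) = \sum_{i<j} D(P_{ij} \,\|\, \bar\theta_{\zeta^*_{ij}}) = C_2\, \Omega\Big(\frac{MK}{N^2}\Big).$$

Combining this with $C_2 = C_1\, \Omega(N/K)$ and $C_1 \ge N_e(z)/2$ from observations (ii) and (iii) gives (A7). □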


Proof of Lemma 3. Let $\Pi'$ be a refinement of any partition $\Pi$ of the set $\{P_{ij}\}_{i<j}$, and given $a \in \{1, \ldots, L'\}$ indexing $S'_a$, let $F(a)$ denote its index under $\Pi$. We show that $\bar{L}^*_P(\Pi') \ge \bar{L}^*_P(\Pi)$ as follows:

$$\bar{L}^*_P(\Pi') = \sum_{a=1}^{L'} |S'_a| \big\{\bar\theta'_a \log \bar\theta'_a + (1 - \bar\theta'_a) \log(1 - \bar\theta'_a)\big\}$$

$$\ge \sum_{a=1}^{L'} |S'_a| \big\{\bar\theta'_a \log \bar\theta_{F(a)} + (1 - \bar\theta'_a) \log(1 - \bar\theta_{F(a)})\big\}$$

$$= \sum_{b=1}^{L} |S_b| \big\{\bar\theta_b \log \bar\theta_b + (1 - \bar\theta_b) \log(1 - \bar\theta_b)\big\} = \bar{L}^*_P(\Pi),$$

where the inequality holds by nonnegativity of Kullback-Leibler divergence, and the second equality follows from the fact that $\Pi'$ is a refinement of $\Pi$. □
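
Lemma 3 can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions rather than part of the proof: the function name `l_star`, the flattened list of edge probabilities, and the randomly generated partitions are all hypothetical.

```python
import math
import random

def l_star(probs, labels):
    """Evaluate L*_P for a partition of the edge probabilities.

    `probs` is a flat list of the P_ij (i < j) and `labels[m]` names the
    subset containing `probs[m]`; each subset mean plays the role of theta_l.
    """
    sums, counts = {}, {}
    for p, l in zip(probs, labels):
        sums[l] = sums.get(l, 0.0) + p
        counts[l] = counts.get(l, 0) + 1
    total = 0.0
    for p, l in zip(probs, labels):
        theta = sums[l] / counts[l]
        total += p * math.log(theta) + (1.0 - p) * math.log(1.0 - theta)
    return total

rng = random.Random(0)
probs = [rng.uniform(0.05, 0.95) for _ in range(30)]
coarse = [rng.randrange(3) for _ in probs]
# Refine the coarse partition by splitting each subset with a random extra bit.
fine = [(c, rng.randrange(2)) for c in coarse]
# The refinement should never decrease L*_P (up to rounding error).
assert l_star(probs, fine) >= l_star(probs, coarse) - 1e-12
```

The finest partition, with each $P_{ij}$ in its own subset, attains the largest value, which corresponds to $\bar{L}_P(\bar{z})$ when the subset means equal the $P_{ij}$ themselves.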


References 

Airoldi, E., Blei, D., Xing, E. & Fienberg, S. (2005). A latent mixed membership model for relational data. In Proc. 3rd Intl Worksh. Link Discovery. New York: Association for Computing Machinery, pp. 82-89.

Airoldi, E. M., Blei, D. M., Fienberg, S. E. & Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981-2014.

Bickel, P. J. & Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl Acad. Sci. U.S.A. 106, 21068-21073.

Chung, F. R. K. & Lu, L. (2006). Complex Graphs and Networks. Providence, Rhode Island: American Mathematical Society.

Copic, J., Jackson, M. O. & Kirman, A. (2009). Identifying community structures from network data via maximum likelihood methods. Berk. Electron. J. Theoret. Econom. 9. RePEc:bpj:bejtec:v:9:y:2009:i:1:n:30.

Doreian, P., Batagelj, V. & Ferligoj, A. (2005). Generalized Blockmodeling. Cambridge, U.K.: Cambridge University Press.

Dubhashi, D. P. & Panconesi, A. (2009). Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge, U.K.: Cambridge University Press.

Fienberg, S. E., Meyer, M. M. & Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. J. Am. Statist. Ass. 80, 51-67.

Girvan, M. & Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl Acad. Sci. U.S.A. 99, 7821-7826.

Goldenberg, A., Zheng, A. X., Fienberg, S. E. & Airoldi, E. M. (2009). A survey of statistical network models. Found. Trends Mach. Learn. 2, 129-233.

Handcock, M. S., Raftery, A. E. & Tantrum, J. M. (2007). Model-based clustering for social networks. J. R. Statist. Soc. A 170, 301-354.

Hoff, P. D. (2008). Modeling homophily and stochastic equivalence in symmetric relational data. In Advances in Neural Information Processing Systems, J. C. Platt, D. Koller, Y. Singer & S. Roweis, eds., vol. 20. Cambridge, Massachusetts: MIT Press, pp. 657-664.

Holland, P., Laskey, K. B. & Leinhardt, S. (1983). Stochastic blockmodels: Some first steps. Soc. Netw. 5, 109-137.

Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles. New York: Springer.

Karrer, B. & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107-1-10.

Lorrain, F. & White, H. C. (1971). Structural equivalence of individuals in social networks. J. Math. Sociol. 1, 49-80.

Mariadassou, M., Robin, S. & Vacher, C. (2010). Uncovering latent structure in valued graphs: A variational approach. Ann. Appl. Statist. 4, 715-742.

Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl Acad. Sci. U.S.A. 103, 8577-8582.

Nowicki, K. & Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Am. Statist. Ass. 96, 1077-1087.

Rohe, K., Chatterjee, S. & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. To appear.

Snijders, T. A. B. & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classif. 14, 75-100.

Traud, A. L., Kelsic, E. D., Mucha, P. J. & Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. To appear.

Wang, Y. J. & Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. J. Am. Statist. Ass. 82, 8-19.

[Received  November  2010.  Revised  April  2011] 


